Implementing Cross-Language Text Retrieval Systems for Large-scale Text Collections and the World Wide Web

Authors

Mark W. Davis

William C. Ogden

Abstract

QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish, but converts the English query into a Spanish equivalent set through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned document to produce a simple translation that can be examined by non-Spanish speakers to gauge the relevance of the document to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
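The abstract does not spell out how the query conversion works, so the following is only a minimal sketch of the general idea of term substitution followed by parallel-corpus disambiguation, not QUILT's actual method. The glossary, the parallel sentence pairs, the function name `translate_query`, and the overlap-weighted scoring are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy glossary: English term -> candidate Spanish terms.
GLOSSARY = {
    "bank": ["banco", "orilla"],
    "interest": ["interés"],
    "rate": ["tasa", "ritmo"],
}

# Hypothetical toy parallel corpus: (English sentence, Spanish sentence) pairs.
PARALLEL = [
    ("the bank raised the interest rate", "el banco subió la tasa de interés"),
    ("the central bank cut the rate", "el banco central redujo la tasa"),
    ("they walked along the river bank", "caminaron por la orilla del río"),
]


def translate_query(query, glossary=GLOSSARY, parallel=PARALLEL):
    """Map an English query to a list of Spanish terms.

    Step 1 (term substitution): look up every query term in the glossary.
    Step 2 (disambiguation): prefer the candidates that appear most often
    on the Spanish side of sentence pairs whose English side shares terms
    with the query. This scoring rule is an assumption of this sketch.
    """
    terms = query.lower().split()

    # Score Spanish tokens by how strongly their sentence pair matches the query.
    counts = Counter()
    for english, spanish in parallel:
        overlap = len(set(terms) & set(english.split()))
        if overlap:
            for token in set(spanish.split()):
                counts[token] += overlap

    spanish_terms = []
    for term in terms:
        candidates = glossary.get(term, [])
        if not candidates:
            continue  # no glossary entry: the term is dropped
        best = max(counts[c] for c in candidates)
        # Keep every candidate tied for the best score; with no corpus
        # evidence all candidates tie at zero and all are kept.
        spanish_terms.extend(c for c in candidates if counts[c] == best)
    return spanish_terms


print(translate_query("bank interest rate"))  # ['banco', 'interés', 'tasa']
```

Here the financial sentence pairs outvote the "river bank" pair, so "bank" resolves to "banco" rather than "orilla". The resulting Spanish term set could then be submitted to an index built over the Spanish documents.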

Similar resources

Ijaz: An Operational System for Single-Document Summarization of Persian News Texts

The rapid growth of published documents on the web has created new demands for processing, classification and information retrieval, so the use of natural language processing tools has increased around the world. Automatic summarization is known as the core of a wide range of text-processing tools such as decision systems, accountability systems, search engines, etc., and always has been inv...

Cortina: A System for Large-scale, Content-based Web Image Retrieval and the Semantics within

Recent advances in processing and networking capabilities of computers have led to an accumulation of immense amounts of multimedia data such as images. One of the largest repositories for such data is the World Wide Web. There is an urgent need for systems which allow to search these vast on-line collections. We present Cortina, a large-scale image retrieval system for the World Wide Web. It h...

Knowledge discovery in the Internet

With the rapid expansion of the World Wide Web, the need for efficient data retrieval strategies becomes stronger and will be still growing. Unfortunately classical information retrieval techniques, developed for well-organized collections of textual data do not seem to be able to cope with diversity and amount of information available throughout the Internet. This paper presents some of the ne...

Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval

The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be done by exploiting a large set of parallel texts. However, the problem arisen is the unavailability of large parallel corpora for many languages. In this paper, we describe a mining system that automat...

Information Retrieval on the Web

For the information retrieval (IR) community, the Web now presents a new paradigm, while also generating new challenges and attracting growing interest from around the world. An important example of these challenges is managing huge text collections and evaluating the usefulness of hyperlinks contained within them.

Publication date: 2002
